-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Support approximate related locations #19943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d065479
to
75b4c30
Compare
75b4c30
to
4a2d795
Compare
The code LGTM, and DCA looks good as far as I can tell. I inspected join orders manually by editing the following into - ["/home/runner/work/bulk-builder/bulk-builder/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java", 296, 296]
- ["/home/runner/work/bulk-builder/bulk-builder/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java", 10000, 109999] This alert filter marks line 296, which contains a source for the query, and 100,000 unrelated (non-existent) lines just to see some big tuple counts. Note: the line number 296 may be out of date. We get the following tuple counts for
There is a slight problem where we get 300,003 tuples, which is the number of sources in the file times the number of lines in the diff range in that file. To avoid that risk of blow-up, I propose calling restrictAlertsToStartLine(pragma[only_bind_into](filePath), [locStartLine .. locEndLine]) Now the join order becomes more sensible, with the tuple numbers increasing only slightly to allow for multi-line sources:
|
private predicate restrictAlertsToEntireFile(string filePath) { restrictAlertsTo(filePath, 0, 0) } | ||
|
||
pragma[nomagic] | ||
private predicate restrictAlertsToStartLine(string filePath, int line) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Despite the name, this predicate does not seem to treat the start line in any special way. Perhaps it can be renamed to restrictAlertsToLine
?
It's better to join with the range expression first since that will only multiply tuple counts by the number of lines in an average source/sink. Joining with `restrictAlertsToStartLine` first would multiply tuple counts by the number of sources/sinks in a given file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following QL doc for restrictAlertsTo
needs to be updated:
and its start line is between `startLineStart` and `startLineEnd`, inclusive. (Note that only start line of the location is used for matching because an alert is displayed on the first line of its location.)
E.g. to
and its line span is between `startLineStart` and `startLineEnd`, inclusive.
The documentation is now up-to-date with the new and more relaxed rules that allow overapproximating the results. I have also attempted to make a clearer distinction between the requirements of the specification and the behaviour of the implementation.
I can see the docs were confusing because they had not been kept up to date with the evolution of the requirements. I've now rearranged them and tried to distinguish clearly between the spec of diff-informed queries in |
Some queries could not be made diff-informed because they select an entity whose location is not a
Location
. Specifically this is true for the RegExpTerms inside a regular expression.This PR addresses this by adding the ability to filter based on a location containing the final selected location. We then filter based on the location of the entire RegExp literal, which is valid because it encloses the location of all its terms.
Unfortunately the Ruby query still can't be made diff-informed because the locations of its RegExpTerms aren't correct when the regexp is parsed from a string arising from constant folding. But it fixes it for Python and Java.
Initially in this PR the approximate location filtering was opt-in via a separate API, but after discussing with @jbj it is then made the default in a later commit.